DNA methylation enables recurrent endogenization of giant viruses in an animal relative

5-Methylcytosine (5mC) is a widespread silencing mechanism that controls genomic parasites. In eukaryotes, 5mC has gained complex roles in gene regulation beyond parasite control, yet 5mC has also been lost in many lineages. The causes of 5mC retention and its genomic consequences are still poorly understood. Here, we show that Amoebidium appalachense, a protist closely related to animals, features both transposon and gene body methylation, a pattern reminiscent of invertebrates and plants. Unexpectedly, hypermethylated genomic regions in Amoebidium derive from viral insertions, including hundreds of endogenized giant viruses, contributing 14% of the proteome. Using a combination of inhibitors and genomic assays, we demonstrate that 5mC silences these giant virus insertions. Moreover, alternative Amoebidium isolates show polymorphic giant virus insertions, highlighting a dynamic process of infection, endogenization, and purging. Our results indicate that 5mC is critical for the controlled coexistence of newly acquired viral DNA within eukaryotic genomes, making Amoebidium a unique model to understand the hybrid origins of eukaryotic DNA.


Fig. S2. Genome assembly, completeness and repeat landscape in Amoebidium. (A) Heatmap representing the micro-C contact map across and within Amoebidium chromosomes, with a

Fig. S3. Trinucleotide context influences CG methylation distribution in Amoebidium. (A) Genome-wide methylation levels on CG dinucleotides surrounded by all possible base compositions, measured by Enzymatic Methyl-seq. (B) Regional methylation across genes, promoters, repeats and viral insertions, divided by trinucleotide context. (C) Non-CGC/GCG methylation distribution on TEs classified by age. TE insertions with Kimura distances spanning 5% differences are shown from left to right. Centre lines in boxplots are the median, box is the interquartile range
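As an illustration of the kind of context analysis summarised in Fig. S3A, the following minimal sketch pools methylation calls for CG sites by their flanking bases. It is not the authors' pipeline: the function name `cg_context_methylation`, the input layout (a reference sequence plus a dictionary of per-site read counts) and the choice of one flanking base on each side are all assumptions made for this example.

```python
from collections import defaultdict


def cg_context_methylation(seq, calls):
    """Pool methylation levels of CG sites by flanking context.

    seq   : reference sequence (string).
    calls : dict mapping the 0-based position of the C in a CG
            to a (methylated_reads, total_reads) tuple.
    Returns a dict mapping each 4-base context (N-CG-N) to the
    fraction of methylated reads pooled over all sites in that context.
    """
    seq = seq.upper()
    totals = defaultdict(lambda: [0, 0])  # context -> [methylated, total]
    for pos, (meth, cov) in calls.items():
        # Require one flanking base on each side of the CG.
        if pos < 1 or pos + 2 >= len(seq):
            continue
        # Skip calls that do not sit on a CG dinucleotide.
        if seq[pos:pos + 2] != "CG":
            continue
        context = seq[pos - 1:pos + 3]
        # Ignore contexts containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in context):
            continue
        totals[context][0] += meth
        totals[context][1] += cov
    return {ctx: m / c for ctx, (m, c) in totals.items() if c > 0}


# Toy input (hypothetical values): two CG sites in different contexts.
example_seq = "AGCGCTTACGTA"
example_calls = {2: (9, 10), 8: (1, 10)}
levels = cg_context_methylation(example_seq, example_calls)
```

Pooling read counts per context, rather than averaging per-site ratios, weights each site by its coverage; either choice could be defended, and the sketch makes no claim about which one underlies the figure.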

Fig. S4. The three major types of giant repeats in the Amoebidium genome. (A) Genome browser snapshot of a large GEVE in chromosome 12. Coloured genes highlight various PFAM-containing Open Reading Frames. (B) Genome snapshot of an Adintovirus insertion spanning 29 kb, flanked by host protein-coding genes. mCP: minor capsid protein, MCP: major capsid protein. (C) Plavaka-associated giant repeat spanning 42 kb. Genes are colour coded according to the taxonomy of the first hits against the NCBI non-redundant protein database.

Fig. S5. Phylogenetic alliances of Amoebidium endogenized viruses. (A) Maximum-likelihood phylogeny of the NCLDV major capsid protein (PF04451) found in the Giant Virus Database, including select GEVEs. Amoebidium sequences are found in the red clade embedded in the Pandoravirales clade. Potentially endogenized capsid proteins in the filastereans Pigoraptor chileana/vietnamica, the alga Klebsormidium nitens and the amoebozoan Acanthamoeba castellanii are displayed with coloured dots. (B) Maximum-likelihood phylogeny of the VLTF3 (Pox_VLTF3, PF04947) genes found in the Giant Virus Database and select eukaryotes. Amoebidium GEVEs are shown in a red branch sister group to Pandoravirales, including Medusavirus. Amoebidium adintoviruses are shown in purple; no other polinton-like viruses are shown, as these do not normally encode VLTF3, which was probably acquired from a giant virus. (C) Amino acid maximum-likelihood phylogeny of a widely distributed Orthogroup (OG000007) across GEVEs, and major capsid proteins with outgroups from panel A. Each GEVE insertion is named as GV[number] together with its chromosome of origin. Dotted lines connect insertions that encode both ORFs. (D) Maximum-likelihood phylogeny of DNA polymerase type B ORF from adintoviruses, adenoviruses and virophages from (39). (E) Nucleotide maximum-likelihood phylogeny of full-length adintovirus insertions. These are labelled with numbers reflecting chromosome of origin. Scale bars depict substitutions per site in all phylogenetic trees.

Fig. S9. Genomes and endogenized viral phylogenies of Amoebidium isolates. (A) Maximum-likelihood phylogenetic tree displaying the nuclear ribosomal 18S RNA phylogeny of Amoebidium isolates and the closely related genera Ichthyophonus and Paramoebidium. The Isolate 9181 (orange) 18S is 100% identical to A. appalachense in NCBI and the Isolate 9257 (blue) 18S is 100% identical to A. parasiticum in NCBI. (B) Genome size, repeat distribution, BUSCO completeness and global CG methylation levels across the three Amoebidium genomes. (C) Maximum-likelihood phylogeny of VLTF3 encoded in giant virus and adintovirus endogenization events. Dots indicate the genome of origin, following the colour code of Fig. 4A. (D) Maximum-likelihood phylogeny of DNA Polymerase family B encoded in adintoviruses. Coloured dots represent genome of origin. (E) Dot plot showing the genome alignment of contig-level assemblies of Isolate 9181 and 9257 compared to the chromosome-scale A. appalachense reference genome. Identity of alignment windows is calculated with minimap2 (via the D-GENIES server).